Exploration of Premier League SPI Ratings
Motivation
As avid viewers of the English Premier League, the top flight of English soccer and arguably the most competitive soccer league in the world, we were interested in exploring how the dominance of some teams in the league can be explained, as well as how financial means contribute to this dynamic. We found our inspiration for exploring these relationships through data in recent articles published by the sports news outlet The Athletic. These articles look at how financial fair play rules shape the current landscape of the league and how some teams have abused their financial means to succeed over other teams (1), as well as how the league has seemingly lost its competitive edge due to the relative dominance of a couple of teams in recent years (2). These findings motivated us to look at how the power index of teams has changed over the years, and how a given team’s power index relates to its financial means.
Introduction/Background
This dataset contains match results and associated “power rankings”, based on the Soccer Power Index (SPI), across the top five European soccer leagues from the 2016-2022 seasons. For our project we narrowed the data down to just the English Premier League. This project was interesting for us to explore in a data setting because of the variety of metrics that are not usually visible when watching a match. For example, we could look at how a team’s SPI increased or decreased across multiple seasons and how many games the team won or lost. It is also important to understand that in the English Premier League, the teams finishing in the bottom three of the table at the end of the season are relegated. This adds an interesting dynamic to explore with the SPI variable. Despite the variety of paths we could have taken with this data, we decided to take an in-depth look at how the SPI metric relates to games and teams. Do teams with higher SPI scores always win more games in a season? Do teams with lower SPI scores win fewer games? Do teams with a higher average SPI score ever drop below the average SPI threshold across all teams?
Figure 1
Figure 2
We started this project by delving into the losing percentages for all Premier League teams across 7 seasons (Figure 1). This showed us that some teams performed, on average, far worse than others across all seasons. For example, Norwich City, Sunderland, and Middlesbrough all have very high losing percentages in the Premier League. This makes sense, as these teams are often at the bottom of the league and are often relegated at the end of the season. We then looked at the distribution of SPI scores to check whether they were approximately normally distributed, with the median SPI value across all teams and seasons plotted for reference (Figure 2). This showed that the majority of SPI scores fall roughly within the 60-80 range. An SPI rating, or score, can be defined as the percentage of available points (a win is worth 3 points, a tie 1 point, and a loss 0 points) that the team would be expected to take if a given match were played over and over again.
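The definition of SPI as a percentage of available points can be made concrete with a small sketch. It assumes we already have a win probability and a tie probability for the repeated match; the function name and the example probabilities are illustrative and do not come from the dataset.

```python
def expected_points_share(p_win: float, p_tie: float) -> float:
    """Percentage of available points a team expects from one repeated match.

    A win is worth 3 points, a tie 1 point, and a loss 0 points,
    so at most 3 points are available per match.
    """
    if p_win < 0 or p_tie < 0 or p_win + p_tie > 1 + 1e-9:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    expected_points = 3 * p_win + 1 * p_tie  # a loss contributes nothing
    return 100 * expected_points / 3


# Illustrative probabilities only: 50% win, 25% tie, 25% loss.
share = expected_points_share(0.50, 0.25)
print(round(share, 1))
```

Under these made-up probabilities the team expects 1.75 of the 3 available points, which is the kind of quantity an SPI score summarizes.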
Exploration of SPI
Figure 3
Figure 4
Figure 5
We then set out to look at how cumulative wins and losses related to SPI over 7 seasons, as well as whether there was a relationship between the number of goals scored in a game and mean SPI over 7 seasons. The first plot (Figure 3) showed a positive linear relationship between a team’s cumulative number of wins and its mean SPI over 7 seasons. However, the relationship between mean SPI and cumulative number of losses (Figure 4) was not linear: plotting total losses against SPI scores produced a bell-curve shape. This trend was likely influenced by teams that are often promoted to and relegated from the Premier League. These teams have a very low SPI rating but fewer total losses than teams that are constantly in the Premier League but are not especially successful. To illustrate, a recently promoted team may lose all of its games in a season, be relegated, and then never appear in the league again within the timespan of this dataset. In contrast, other teams consistently win just enough games to stay in the league and, as a result, also have low SPI scores. These teams accumulate more total losses than the one-season teams that crashed out, leading to this unexpected trend. The plot of goals per game against mean SPI rating over 7 seasons (Figure 5) revealed a pattern similar to Figure 3: a positive linear relationship between the number of goals a team scored per game and its mean SPI over 7 seasons.
What we gathered from these results is that a more effective way to measure team performance across all seasons is to look at each team’s relative win/loss percentage rather than its cumulative wins or losses.
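The difference between cumulative losses and losing percentage can be sketched with two invented teams. The team names, result labels ("W", "D", "L"), and counts below are hypothetical and chosen only to mirror the one-season versus seven-season scenario described above.

```python
from collections import defaultdict


def summarize(results):
    """Return total losses and losing percentage per team.

    `results` is an iterable of (team, result) pairs, where result is
    "W", "D", or "L" (assumed labels, not the dataset's own encoding).
    """
    totals = defaultdict(lambda: {"W": 0, "D": 0, "L": 0})
    for team, result in results:
        totals[team][result] += 1

    summary = {}
    for team, counts in totals.items():
        played = sum(counts.values())
        summary[team] = {
            "losses": counts["L"],
            "loss_pct": 100 * counts["L"] / played,
        }
    return summary


# Hypothetical team relegated after a single 38-game season.
one_season = [("One-Season FC", "L")] * 30 + [("One-Season FC", "W")] * 8
# Hypothetical team that survives all 7 seasons (7 * 38 = 266 games).
survivor = (
    [("Survivor United", "L")] * 112
    + [("Survivor United", "D")] * 84
    + [("Survivor United", "W")] * 70
)

summary = summarize(one_season + survivor)
for team, stats in summary.items():
    print(team, stats["losses"], round(stats["loss_pct"], 1))
```

The surviving team has far more total losses, yet the one-season team has the much higher losing percentage, which is why the percentage is the fairer comparison across teams with different numbers of seasons.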
Figure 6
Next, we wanted to look at the minimum and maximum SPI scores across all teams (Figure 6), and at whether teams we know to have been historically successful have dropped below the average SPI score across all teams in the Premier League. We found that Arsenal, Manchester City, Manchester United, Chelsea, Liverpool, and Tottenham have not dropped below the average SPI threshold. This makes sense, as they have historically been the most successful teams and, as shown in the SPI analysis, have the six highest average SPI scores in the dataset. The teams that fell below the average were Cardiff City, Huddersfield Town, Hull City, Middlesbrough, Norwich City, Nottingham Forest, Stoke City, Sunderland, Swansea, and West Bromwich Albion, which makes sense as these teams were either relegated or newly promoted at some point during the timespan of the data.
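A minimal sketch of this threshold check is shown here. It assumes the "average SPI threshold" is the mean of all SPI observations pooled across teams, which is one possible reading; the team names and ratings are invented for illustration.

```python
from statistics import mean


def teams_never_below_average(ratings):
    """Return the pooled average SPI and the teams whose minimum SPI stays at or above it.

    `ratings` maps a team name to a list of its SPI observations.
    """
    all_values = [value for values in ratings.values() for value in values]
    league_avg = mean(all_values)
    never_below = sorted(
        team for team, values in ratings.items() if min(values) >= league_avg
    )
    return league_avg, never_below


# Hypothetical SPI observations for three invented teams.
ratings = {
    "Team A": [85.0, 90.0, 88.0],
    "Team B": [70.0, 62.0, 75.0],
    "Team C": [55.0, 58.0, 60.0],
}

league_avg, never_below = teams_never_below_average(ratings)
print(round(league_avg, 2), never_below)
```

Only a team whose lowest observed rating still clears the pooled average is reported, which matches the idea of a team never dropping below the threshold.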
Figure 7
Then, we looked at how the transfer activity of Premier League teams has related to their SPI (Figure 7). For context, the data we scraped covers 5 years’ worth of transfer activity. Players are transferred from team to team through teams buying and selling them, and almost every team tends to spend more money buying players than it makes selling players to other teams. The values we scraped represent the net gains and losses that a team made throughout the five-year period. All of these values except one were negative, which is why we labelled this figure as “net losses”. What we found is that, for the most part, teams that spent more than they made in the transfer market (higher net losses) had a higher SPI, but some teams went against this generalization. For example, Newcastle has much higher net losses than other teams with a comparable SPI, but this is because the club was recently bought by new owners who injected new money into its transfer activity. Additionally, the top two teams, Liverpool and Manchester City, do not have anywhere near the net losses of the next four teams. This suggests that it is not only the amount of money spent that matters, but also the overall strategy of recruitment and youth development.
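The net figure described above is simply income from sales minus spending on purchases. The sketch below uses invented clubs and amounts (in millions) to show how a negative balance becomes a positive "net loss"; none of these numbers come from the scraped data.

```python
def net_transfer_balance(sales_income: float, purchase_spend: float) -> float:
    """Income from selling players minus spend on buying them.

    A negative result means the club spent more than it made.
    """
    return sales_income - purchase_spend


# Hypothetical (sales income, purchase spend) totals in millions.
activity = {
    "Club A": (120.0, 450.0),
    "Club B": (200.0, 180.0),
}

balances = {club: net_transfer_balance(*amounts) for club, amounts in activity.items()}
# Report net losses as positive numbers, keeping only clubs that lost money.
net_losses = {club: -balance for club, balance in balances.items() if balance < 0}
print(balances)
print(net_losses)
```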
Figure 8
Next, we wanted to look at how effectively the win probability variable predicted the match winner in the dataset (Figure 8). What we found is that some teams are correctly predicted to win more often than others. The teams whose wins are most often predicted correctly have a higher SPI (as seen in Figure 6), which leads us to believe that the probability variable combined with SPI is a fairly good metric of team success.
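One way to compute such a proportion is sketched here. It assumes each match record carries a probability for each side winning and for a tie, plus the final score; the field names (`team1`, `prob1`, `probtie`, and so on) and the matches themselves are hypothetical, and this is not necessarily how Figure 8 was produced.

```python
from collections import defaultdict


def predicted_win_accuracy(matches):
    """For each team, the share of matches it was favoured to win that it actually won."""
    predicted = defaultdict(int)
    correct = defaultdict(int)
    for m in matches:
        # The key None stands for the tie outcome.
        probs = {m["team1"]: m["prob1"], m["team2"]: m["prob2"], None: m["probtie"]}
        favourite = max(probs, key=probs.get)
        if favourite is None:
            continue  # a tie was the most likely outcome, so no team was favoured

        predicted[favourite] += 1
        if m["score1"] > m["score2"]:
            winner = m["team1"]
        elif m["score2"] > m["score1"]:
            winner = m["team2"]
        else:
            winner = None
        if winner == favourite:
            correct[favourite] += 1
    return {team: correct[team] / predicted[team] for team in predicted}


# Three invented matches.
matches = [
    {"team1": "A", "team2": "B", "prob1": 0.60, "prob2": 0.15, "probtie": 0.25, "score1": 2, "score2": 0},
    {"team1": "A", "team2": "C", "prob1": 0.55, "prob2": 0.20, "probtie": 0.25, "score1": 1, "score2": 1},
    {"team1": "B", "team2": "C", "prob1": 0.30, "prob2": 0.45, "probtie": 0.25, "score1": 0, "score2": 1},
]

accuracy = predicted_win_accuracy(matches)
print(accuracy)
```

Teams that were never the favourite do not appear in the result, since there is no prediction of a win to evaluate for them.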
Finally, for our last visualization we looked at how SPI rating fluctuated for each team over the course of 7 seasons and how well it predicted the winner of the league. What we found is that, in most cases, the winner of the league had a high mean SPI. We also found that SPI varied quite a bit across seasons for most teams, showing how dynamic the league is from year to year. The one exception to SPI being a determinant of the potential league winner was in 2016, when Leicester City won the league. Going into that season, Leicester had just come through a relegation battle the previous season and had odds of winning the league of 5,000 to 1, or about a 0.02% chance.
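The conversion from 5,000 to 1 odds to a percentage can be checked with a short sketch. It treats the odds as fractional odds against and ignores any bookmaker margin, which is a simplifying assumption; the function name is illustrative.

```python
def odds_against_to_probability(odds_against: float) -> float:
    """Implied probability for fractional odds of `odds_against` to 1."""
    return 1 / (odds_against + 1)


probability = odds_against_to_probability(5000)
print(f"{100 * probability:.2f}%")
```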
Limitations
The main limitation of this project was the range of years available in the datasets that we used. The dataset containing the SPI data had only 7 seasons from 2015-2022, and it would have been useful to have data from earlier years through to the present. Similarly, the data we scraped for transfer market net losses covered only 5 years, from 2016-2021, and it would have been useful to have data up to the present. An additional limitation is that there were fairly few financial regulations on teams, especially in the earlier years of the dataset, which likely skews the results we see and contributes to the “top six” effect of teams being dominant both financially and performance-wise.
Main Takeaways
This data exploration revealed that SPI is a fairly good metric of a team’s performance in a season, especially when a team is in the top six of the Premier League. The metric loses power for teams lower in the Premier League table, as shown in our plot of the proportion of teams accurately predicted to win based on probability (Figure 8). Furthermore, teams with a larger financial capacity for spending, and therefore higher net losses, are more likely to perform well, save for a couple of outliers.
Future Directions
Future work could include statistical tests on some of the relationships we observed. For example, can we predict with significance when a team will lose a match based on its financial capacity as well as its SPI? Can we show statistical significance for the relationships displayed in this exploration? There are still many questions that can be analyzed in this dataset, and exploring the predictive power of the SPI variable has a lot of potential for future research.
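As one possible starting point for such tests, the sketch below computes a Pearson correlation and a permutation-based p-value using only the standard library. The permutation test is our suggestion rather than something used in this exploration, and the win totals and mean SPI values are invented.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Two-sided p-value: how often shuffled data correlates at least as strongly."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical cumulative wins and mean SPI for six invented teams.
wins = [10, 25, 40, 60, 80, 120]
mean_spi = [55.0, 60.0, 66.0, 70.0, 78.0, 90.0]

r = pearson_r(wins, mean_spi)
p = permutation_p_value(wins, mean_spi)
print(round(r, 3), p < 0.05)
```

A permutation test makes no normality assumption, which may suit a small sample of teams, but other tests could be equally appropriate depending on the question.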
References
(1) Staff, T. A. U. (n.d.). Premier League financial fair play rules explained. The Athletic. Retrieved April 25, 2024, from https://theathletic.com/5198425/2024/01/13/premier-league-ffp-financial-fair-play/
(2) Eccleshare, C. (n.d.). With Manchester City’s continuing dominance, has the Premier League become a one-team division? The Athletic. Retrieved April 25, 2024, from https://theathletic.com/4520869/2023/05/16/premier-league-manchester-city-dominance/